\section{Related Work}
%In this section, we will review the related studies in three aspects, namely recommender systems, heterogeneous information networks and neural attention mechanism.
In the literature on recommender systems, early works mainly adopt collaborative filtering (CF) methods (\eg matrix factorization) that utilize historical interactions for recommendation~\cite{koren2009matrix}. Since CF methods usually suffer from the cold-start problem, many works attempt to leverage additional information for recommendation, such as social information~\cite{wang2016social,zhao2014we,zhao2016connecting}, location information~\cite{yin2013lcars}, and heterogeneous information~\cite{feng2012incorporating}.
In addition, several general feature-based frameworks have been proposed for incorporating context information into recommendation~\cite{rendle2010factorization,chen2012svdfeature,he2017neural2}.
Recently, deep network models have also been employed to extract refined latent features from user-item interaction data~\cite{he2017neural}.


As a newly emerging direction, heterogeneous information networks (HINs)~\cite{shi2017survey} can naturally model complex objects and their rich relations in recommender systems, where objects are of different types and links among objects represent different relations~\cite{sun2011pathsim,shi2014hetesim}.
Owing to their flexibility in modeling various kinds of heterogeneous data, HINs have been adopted
in recommender systems to model rich auxiliary data.
Most HIN-based methods utilize path-based similarity to enhance the representations of users and items, including meta-path based latent features~\cite{yu2014personalized}, meta-path based user similarity~\cite{shi2015semantic,liu2017personalized}, and
meta-graph based latent features~\cite{zhao2017meta}. However, they seldom learn explicit representations of paths or meta-paths tailored to the recommendation task.
%Yu et al.~\cite{yu2014personalized} introduce meta-path based latent features to represent the connectivity between users and items. Shi et al.~\cite{shi2015semantic} propose the concept of weighted heterogeneous information network and employ the meta-path based similarity of users for personalized recommendation. Recently, Zhao et al.~\cite{zhao2017meta} propose a factorization machine based model integrated with meta-graph based similarity for recommendation. 

On the other hand, network embedding has shown its potential for structural feature extraction and has been successfully applied in many data mining tasks. For example, DeepWalk~\cite{perozzi2014deepwalk} and node2vec~\cite{grover2016node2vec} combine random walks and skip-gram to learn network representations. Most network embedding methods focus on homogeneous networks, and thus they cannot be directly applied to heterogeneous networks. Recently, attention has increasingly shifted towards heterogeneous networks. Xu et al.~\cite{xu2017embedding} propose an Embedding of Embedding model to encode the intra-network and inter-network edges of a coupled heterogeneous network. Dong et al.~\cite{dong2017metapath2vec} obtain the neighbors of a node via meta-paths and learn HIN embeddings by skip-gram with negative sampling. Furthermore, Fu et al.~\cite{Fu2017HIN2Vec} learn node embeddings that capture rich relation semantics in HINs via a neural network model. Although these HIN embedding methods have shown their effectiveness in some tasks, they usually focus on general node embeddings and seldom consider path embeddings for the recommendation task.

%As a newly emerging direction, heterogeneous information network~\cite{shi2017survey} can naturally model complex objects and their rich relations in recommender systems, in which objects are of different types and links among objects represent different relations~\cite{sun2013mining}. And several path-based similarity measures~\cite{sun2011pathsim,shi2014hetesim} are proposed to evaluate the similarity of objects in heterogeneous information network. Therefore, some researchers have began to be aware of the importance of HIN-based recommendation. Wang et al.~\cite{feng2012incorporating} propose the OptRank method to alleviate the cold-start problem by utilizing heterogeneous information contained in social tagging system. Furthermore, the concept of meta-path is introduced into hybrid recommender systems~\cite{yu2013recommendation}. Yu et al.~\cite{yu2013collaborative} utilize meta-path-based similarities as regularization terms in the matrix factorization framework. Yu et al.~\cite{yu2014personalized} take advantage of different types of entity relationships in heterogeneous information network and propose a personalized recommendation framework for implicit feedback dataset. Luo et al.~\cite{luo2014hete} propose a collaborative filtering based social recommendation method using heterogeneous relations. More recently, Shi et al.~\cite{shi2015semantic} propose the concept of weighted heterogeneous information network and design a meta-path based collaborative filtering model to flexibly integrate heterogeneous information for personalized recommendation. In \cite{shi2016integrating}, the similarities of users and items are both evaluated by path based similarity measures under different semantic meta-paths and a matrix factorization based on dual regularization framework is proposed for rating prediction. Most of HIN-based methods rely on the path based similarity, which may not fully mine latent features of users and items on HINs for recommendation.

Our work is inspired by the recent progress of neural attention mechanisms in the fields of computer vision~\cite{xu2015show} and natural language processing (NLP)~\cite{phan2017neupl}. In particular, co-attention or cross-attention mechanisms have been applied to solve complicated NLP tasks~\cite{hao2017end,xiong2016dynamic}.
We are also aware of the applications of attention mechanisms in recommender systems~\cite{chen2017attentive,wang2017dynamic,xiao2017attentional}.
We borrow the idea of co-attention mechanisms to model the mutual effect between the meta-path and the involved user-item pair in an interaction.
To our knowledge, this is the first time that meta-path based context has been explicitly modeled in a three-way neural interaction model with a co-attention mechanism.
%, which has a totally different task goal.

% Its success is mainly due to the reasonable assumption that human recognition does not tend to process a whole signal in its entirety at once; instead, one only focuses on selective parts of the whole perception space when and where as needed.

%Recently, some co-attention mechanisms have been proposed to solve more complex interaction in NLP. For example, Hao et al.~\cite{hao2017end} propose a novel cross-attention to improve the representation of the question and answer for KB-QA. Xiong et al.~\cite{xiong2016dynamic} present a co-attention mechanism that attends to the question and document simultaneously and fuse both attention contexts for QA task. Inspired by these methods, our MCRec designs a hierarchical co-attention structure for recommendation tasks.


